Picture for Kunyu Shi

Kunyu Shi

SwimBird: Eliciting Switchable Reasoning Mode in Hybrid Autoregressive MLLMs

Add code
Feb 05, 2026
Viaarxiv icon

Enhancing Vision-Language Pre-training with Rich Supervisions

Add code
Mar 05, 2024
Figure 1 for Enhancing Vision-Language Pre-training with Rich Supervisions
Figure 2 for Enhancing Vision-Language Pre-training with Rich Supervisions
Figure 3 for Enhancing Vision-Language Pre-training with Rich Supervisions
Figure 4 for Enhancing Vision-Language Pre-training with Rich Supervisions
Viaarxiv icon

Non-autoregressive Sequence-to-Sequence Vision-Language Models

Add code
Mar 04, 2024
Figure 1 for Non-autoregressive Sequence-to-Sequence Vision-Language Models
Figure 2 for Non-autoregressive Sequence-to-Sequence Vision-Language Models
Figure 3 for Non-autoregressive Sequence-to-Sequence Vision-Language Models
Figure 4 for Non-autoregressive Sequence-to-Sequence Vision-Language Models
Viaarxiv icon

Musketeer (All for One, and One for All): A Generalist Vision-Language Model with Task Explanation Prompts

Add code
May 11, 2023
Viaarxiv icon